15 research outputs found
On the Utility of Model Learning in HRI
Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly? In the context of HRI, part of the world to be modeled is the human. One option is for the robot to treat the human as a black box and learn a policy for how they act directly. But it can also model the human as an agent, and rely on a “theory of mind” to guide or bias the learning (grey box). We contribute a characterization of the performance of these methods under the optimistic case of having an ideal theory of mind, as well as under different scenarios in which the assumptions behind the robot's theory of mind for the human are wrong, as they inevitably will be in practice. We find that there is a significant sample complexity advantage to theory of mind methods and that they are more robust to covariate shift, but that when enough interaction data is available, black box approaches eventually dominate.
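A minimal sketch of the grey-box vs. black-box tradeoff the abstract describes, on a toy choice task of our own construction (not the paper's setup): the black-box model estimates the human's action distribution directly from counts, one parameter per option, while the theory-of-mind model assumes Boltzmann rationality under a slightly wrong utility estimate and fits only a single rationality coefficient. All names and parameters here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, BETA = 5, 2.0                      # options; true rationality coefficient
true_utility = rng.normal(size=K)
true_probs = np.exp(BETA * true_utility)
true_probs /= true_probs.sum()

# The robot's theory of mind: it assumes the human is Boltzmann-rational
# w.r.t. a utility estimate that is slightly wrong, as it will be in practice.
assumed_utility = true_utility + 0.1 * rng.normal(size=K)

def black_box(choices):
    # Model-free: estimate the action distribution directly from counts.
    counts = np.bincount(choices, minlength=K)
    return (counts + 1) / (counts.sum() + K)

def theory_of_mind(choices):
    # Grey box: fit only the rationality coefficient beta by maximum
    # likelihood under the assumed utilities.
    best_ll, best_p = -np.inf, None
    for beta in np.linspace(0.1, 5.0, 50):
        p = np.exp(beta * assumed_utility)
        p /= p.sum()
        ll = np.log(p[choices]).sum()
        if ll > best_ll:
            best_ll, best_p = ll, p
    return best_p

# The grey box tends to win at small n (one parameter to fit vs. K), while
# the black box catches up with enough data, since the grey box's error
# floor is set by its misspecified utilities.
for n in (5, 50, 5000):
    choices = rng.choice(K, size=n, p=true_probs)
    for name, model in (("black box", black_box), ("theory of mind", theory_of_mind)):
        err = np.abs(model(choices) - true_probs).sum()
        print(f"n={n:5d}  {name:>15}: L1 error {err:.3f}")
```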
Inverse Reinforcement Learning without Reinforcement Learning
Inverse Reinforcement Learning (IRL) is a powerful set of techniques for
imitation learning that aims to learn a reward function that rationalizes
expert demonstrations. Unfortunately, traditional IRL methods suffer from a
computational weakness: they require repeatedly solving a hard reinforcement
learning (RL) problem as a subroutine. This is counter-intuitive from the
viewpoint of reductions: we have reduced the easier problem of imitation
learning to repeatedly solving the harder problem of RL. Another thread of work
has proved that access to the side-information of the distribution of states
where a strong policy spends time can dramatically reduce the sample and
computational complexities of solving an RL problem. In this work, we
demonstrate for the first time a more informed imitation learning reduction
where we utilize the state distribution of the expert to alleviate the global
exploration component of the RL subroutine, providing an exponential speedup in
theory. In practice, we find that we are able to significantly speed up the
prior art on continuous control tasks.
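A toy illustration of this reduction on a chain MDP of our own construction (nothing here is the paper's code): the inner RL solve improves the policy only from states drawn from the expert's state distribution, rather than exploring globally from the start state, and the outer IRL loop nudges a linear reward toward the expert's occupancy measure.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, H = 10, 2, 15  # chain of S states; actions 0 = left, 1 = right

def step(s, a):
    return min(s + 1, S - 1) if a == 1 else max(s - 1, 0)

def rollout(policy):
    s, states = 0, []
    for _ in range(H):
        states.append(s)
        s = step(s, policy[s])
    return states

expert_policy = np.ones(S, dtype=int)   # the expert always moves right
expert_states = rollout(expert_policy)

def q_value(policy, s, a, reward):
    # One-step lookahead: take action a in state s, then follow policy.
    total, cur = reward[s], step(s, a)
    for _ in range(H - 1):
        total += reward[cur]
        cur = step(cur, policy[cur])
    return total

def rl_with_expert_resets(reward, rounds=5):
    # The key idea: improve the policy only at states the expert visits,
    # instead of exploring globally -- this is what removes the hard
    # exploration component of the inner RL subroutine.
    policy = np.zeros(S, dtype=int)
    for _ in range(rounds):
        for s in set(expert_states):
            policy[s] = max(range(A), key=lambda a: q_value(policy, s, a, reward))
    return policy

# Outer IRL loop: push a linear reward toward states the expert occupies
# more than the learner, then re-solve the (now easy) inner RL problem.
reward = np.zeros(S)
for it in range(10):
    learner_states = rollout(rl_with_expert_resets(reward))
    occ_e = np.bincount(expert_states, minlength=S) / H
    occ_l = np.bincount(learner_states, minlength=S) / H
    reward += occ_e - occ_l
    print(f"iter {it}: occupancy gap {np.abs(occ_e - occ_l).sum():.3f}")
```

The reward update here is a plain gradient step on a linear reward class; the paper's actual inner and outer solvers are more sophisticated, but the structure of the reduction is the same.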
Sequence Model Imitation Learning with Unobserved Contexts
We consider imitation learning problems where the expert has access to a
per-episode context that is hidden from the learner, both in the demonstrations
and at test-time. While the learner might not be able to accurately reproduce
expert behavior early on in an episode, by considering the entire history of
states and actions, they might be able to eventually identify the context and
act as the expert would. We prove that on-policy imitation learning algorithms
(with or without access to a queryable expert) are better equipped to handle
these sorts of asymptotically realizable problems than off-policy methods and
are able to avoid the latching behavior (naive repetition of past actions) that
plagues the latter. We conduct experiments in a toy bandit domain that show
that there exist sharp phase transitions in whether off-policy approaches are
able to match expert performance asymptotically, in contrast to the uniformly
good performance of on-policy approaches. We demonstrate that on several
continuous control tasks, on-policy approaches are able to use history to
identify the context while off-policy approaches actually perform worse when
given access to history.
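A runnable sketch of the latching phenomenon in a toy bandit-style setting of our own design (the paper's actual domain and estimators may differ): the hidden context determines the correct action, observations reveal it only noisily, and the policy conditions on its previous action plus a majority vote over past observations. In expert data the previous action perfectly predicts the next one, so an off-policy (behavioral-cloning) fit latches onto it and copies its own mistakes; an on-policy (DAgger-style) fit, trained on the learner's own histories with expert labels, learns to trust the majority vote instead.

```python
import numpy as np

rng = np.random.default_rng(2)
T, P_OBS = 20, 0.8   # episode length; P(observation reveals the context)

def observe(c):
    return c if rng.random() < P_OBS else 1 - c

def run_episode(policy, c):
    # One episode; policy=None means the expert acts (it knows c and plays
    # a = c). The policy is a table over (previous action: 0/1, 2 = none;
    # majority vote of observations so far: 0/1) -> P(a = 1).
    traj, prev, votes = [], 2, 0
    for _ in range(T):
        votes += 1 if observe(c) == 1 else -1
        maj = int(votes > 0)
        a = c if policy is None else int(rng.random() < policy[prev, maj])
        traj.append((prev, maj, a))
        prev = a
    return traj

def fit(dataset):
    # Tabular maximum likelihood with add-one smoothing.
    counts = np.zeros((3, 2, 2))
    for prev, maj, label in dataset:
        counts[prev, maj, label] += 1
    return (counts[..., 1] + 1) / (counts.sum(axis=-1) + 2)

# Off-policy (behavioral cloning): trained only on expert episodes, where
# the previous action is a perfect predictor of the next one.
bc_data = []
for _ in range(1000):
    bc_data += run_episode(None, rng.integers(2))
bc_policy = fit(bc_data)

# On-policy (DAgger-style): roll out the learner, relabel its own histories
# with the queryable expert, refit. The learner's past actions no longer
# predict the label, so the informative majority vote wins out.
da_data, da_policy = [], np.full((3, 2), 0.5)
for _ in range(200):
    c = rng.integers(2)
    da_data += [(prev, maj, c) for prev, maj, _ in run_episode(da_policy, c)]
    da_policy = fit(da_data)

def avg_reward(policy, n=1000):
    # Fraction of steps where the played action matches the hidden context.
    total = 0
    for _ in range(n):
        c = rng.integers(2)
        total += sum(a == c for _, _, a in run_episode(policy, c))
    return total / (n * T)

print(f"off-policy (BC):    {avg_reward(bc_policy):.2f}")
print(f"on-policy (DAgger): {avg_reward(da_policy):.2f}")
```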
Learning Shared Safety Constraints from Multi-task Demonstrations
Regardless of the particular task we want them to perform in an environment,
there are often shared safety constraints we want our agents to respect. For
example, regardless of whether it is making a sandwich or clearing the table, a
kitchen robot should not break a plate. Manually specifying such a constraint
can be both time-consuming and error-prone. We show how to learn constraints
from expert demonstrations of safe task completion by extending inverse
reinforcement learning (IRL) techniques to the space of constraints.
Intuitively, we learn constraints that forbid highly rewarding behavior that
the expert could have taken but chose not to. Unfortunately, the constraint
learning problem is rather ill-posed and typically leads to overly conservative
constraints that forbid all behavior that the expert did not take. We counter
this by leveraging diverse demonstrations that naturally occur in multi-task
settings to learn a tighter set of constraints. We validate our method with
simulation experiments on high-dimensional continuous control tasks.
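A toy sketch of the multi-task intuition, under assumptions entirely of our own (a gridworld with BFS planners standing in for the expert and for unconstrained optimal behavior): a cell is flagged as constrained if it lies on some rewarding shortcut yet no expert demonstration ever visits it. With a single task this rule forbids nearly everything the expert did not do; diverse tasks cover more of the safe state space and tighten the inferred constraint toward the truly unsafe cell.

```python
from collections import deque

N, START = 5, (0, 0)        # 5x5 grid; all tasks start at (0, 0)
unsafe = {(2, 2)}           # the "plate": truly unsafe, never demonstrated

def neighbors(cell):
    r, c = cell
    for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nxt[0] < N and 0 <= nxt[1] < N:
            yield nxt

def dist_map(src, blocked):
    # BFS distances from src, avoiding blocked cells.
    d, q = {src: 0}, deque([src])
    while q:
        cur = q.popleft()
        for nxt in neighbors(cur):
            if nxt not in blocked and nxt not in d:
                d[nxt] = d[cur] + 1
                q.append(nxt)
    return d

def tempting_cells(goal):
    # Cells on *some* unconstrained shortest path: highly rewarding
    # behavior the expert could have taken.
    d0, d1 = dist_map(START, set()), dist_map(goal, set())
    return {s for s in d0 if d0[s] + d1[s] == d0[goal]}

def expert_demo(goal):
    # One safe shortest path (a single demonstration) via BFS parents.
    parent, q = {START: None}, deque([START])
    while q:
        cur = q.popleft()
        if cur == goal:
            path = set()
            while cur is not None:
                path.add(cur)
                cur = parent[cur]
            return path
        for nxt in neighbors(cur):
            if nxt not in unsafe and nxt not in parent:
                parent[nxt] = cur
                q.append(nxt)
    return set()

def inferred_constraint(goals):
    # Forbid cells that look rewarding but that no demonstration visits.
    tempting, visited = set(), set()
    for g in goals:
        tempting |= tempting_cells(g)
        visited |= expert_demo(g)
    return tempting - visited

tasks = [(4, 4), (0, 4), (4, 0), (2, 4), (4, 2)]
for k in (1, len(tasks)):
    flagged = inferred_constraint(tasks[:k])
    print(f"{k} task(s): {len(flagged):2d} cells flagged; "
          f"true unsafe cell among them: {unsafe <= flagged}")
```

The set-difference rule here is a crude stand-in for the paper's IRL-style constraint learner, but it exhibits the same behavior: the single-task constraint is overly conservative, and adding tasks shrinks it while still flagging the unsafe cell.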